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^ Abstract 

In this paper we suggest a new algorithm for the computation of a best rank 
one approximation of tensors, called alternating singular value decomposition. 
This method is based on the computation of maximal singular values and the 
corresponding singular vectors of matrices. We also introduce a modification 
for this method and the alternating least squares method, which ensures that 
alternating iterations will always converge to a semi-maximal point. (A critical 
point in several vector variables is semi-maximal if it is maximal with respect 
to each vector variable, while other vector variables are kept fixed.) We present 
several numerical examples that illustrate the computational performance of 
^ | the new method in comparison to the alternating least square method. 
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In this paper we consider the best rank one approximation to real d-mode tensors 

T = G pix-x™^ e _ ; d-dimensional arrays with real entries. 

As usual when studying tensors, it is necessary to introduce some notation. 

Setting [m] = {1, . . . ,m} for a positive integer m, for two d-mode tensors T,S G 
Rmi x...xm d we J enote ^ 

(T,«S) := ^2 tii,...,i d s h,...,i d 

the standard inner product of T, S, viewed as vectors in W 111 ' 1712 '■■■' m d. For an integer 
p < d, r G [p] and for Xj r = [xi Jr , . . . , x mjri j r ] T G M™*, we use the standard 
mathematical notation 

®weM x > : = x ii ® ■ ■ ■ ®*3v = [*ii,-,<p] G M m ^ x - Xm *, t iu ..., ip = x iujl . ..x ipdp . 
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(See for example Chapter 5]. In |12j x®y is denoted as xoy and is called vector 
outer product.) 

For a subset P = {ji, . . . ,j p } C [d] of cardinality p = \P\, consider a p-mode 
tensor X = [x^ ^ ] G R m h x - xm i P : where ji < ... < j p . Then we have that 
T x X := Ei Jr e[ m , r ],re[p] U 1 ,...,i d Xi n ,...,i jp is a (d - p)-mode tensor obtained by con- 
traction on the indices ... ,ij p . For example, if T = [ti,j,k] G ]R mxrix ' and x = 
[xi,. . . ,x m ] T G M m ,z = [zi, . . . ,zi] T G R l , then fx (xfgiz) := Ei 6 [m],fce[J] hj,k x i z k, 
and it is viewed as a column vector in R n . Note that for T, S G ]R miX --- xm < i ) we have 
(T,S)=TxS. 

For x G M n we denote by ||x|| the Euclidian norm and for A G R mxn by ||A|| = 
max|| x || =1 ||Ax|| the associated operator norm. Then it is well-known, see e. g. [8], 
that the best rank one approximation of A is given by oxUiV^, where o\ = \\A\\ is 
the largest singular value of A, and m, vj are the associated left and right singular 
vectors. Since the singular vectors have Euclidian norm 1, we have that the spectral 
norm of the best rank one approximation is equal to a\ = \\A\\. 

To extend this property to tensors, let us for simplicity of exposition restrict 
ourselves in this introduction to the case of 3-mode tensors T G M mxnx '. Denote 
by S m_1 := {x G M m , ||x|| = 1} the unit sphere in R m , by S(m,n,l) the set 
gm— l x gn— 1 x gZ— anc j introduce for (x, y, z) G S(m, n, I) the function /(x, y, z) := 
(T, x ® y ® z) . Then computing the best rank one approximation to T is equivalent 
to finding 

max /(x,y,z) = /(x*,y*,z*). (1.1) 

(x,y,z)6S(m,n,Z) 

The tensor version of the singular value relationship takes the form, see |14j . 

T x (y ® z) = Ax, T x (x (8) z) = Ay, T x (x (8) y) = Az, (1.2) 

where ||x|| = ||y|| = ||z|| = 1 and A is a singular value of T ■ 

Let us introduce for p G {1,2} the concept of a p- semi-maximum of f restricted 
to S(m,n,l). For p = 1, the p-semi-maximal points x* , y* , z* of / are the global 
maxima for the three functions /(x, y*, z*), /(x*, y, z*), /(x*,y*,z) restricted to 
gm— g"-- 1 ^ S i_1 , respectively. For p = 2, the p-semi maximal points are the 
pairs (y*,z*), (x*,z*), (x*,y*) that are global maxima of the functions /(x*,y, z), 
/(x,y*,z), /(x,y,z*) on S"" 1 x S^ 1 , S™" 1 x S^ 1 , S™" 1 x S™" 1 , respectively. We 
call (x*, y*, z*) a semi-maximum if it is a p-semi-maximum for p = 1 or p = 2, and 
it is clear how this concept of p-semi-maxima extends to arbitrary d-mode tensors 
with p = 1,2, . . . ,d — 1. In Appendix we discuss in detail 1-local semi- maximal 
points of functions. 



Many approaches for finding the maximum in (1.1) have been studied in the 
literature, see e. g. [12]. An important method, the standard alternating least 
square (ALS) method, is an iterative procedure that starts with xo G S m_1 ,yo G 
S n_1 ,zo G S /_1 , where /(xo,yo,zo) 7^ and then defines the iterates Xj,yj,Zj via 

T x (jj-i g_zj-i) T x (xj_g_Zj_i) _ Tx(xj8 yj) , . 

Xl " ||rx(y i _ 1 ®z i _ 1 )|| ! yi " ||Tx(x i g)z i _ 1 )||' Zl ~ \\Tx(x i ®y i )\\' 1 ' j 

for z = 1, 2, . . . , . 

Note that for alii G N we have 

/(xi_i,yi_i,Zi_i) < /(xi,yi_i,Zi_i) < /(x;, y*, Zf_i) < /(x^y^Zj), 
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i. e., /(xj,yj,Zj) is monotonically increasing and thus converges to a limit, since 
/ is bounded. Typically, (xj,yj,Zj) will converge to a semi-maximum (x, y,z) that 



satisfies (1.2), however this is not clear in general [12|. To overcome this deficiency 
of the ALS and related methods is one of the results of this paper. 

We first discuss an alternative to the ALS algorithm for finding the maximum 



(1.1), where each time we fix only one variable and maximize on the other two. 
Such a maximization is equivalent to finding the maximal singular value and the 
corresponding left and right singular vectors of a suitable matrix, which is a well- 
established computational procedure, [8]. We call this method the alternating sin- 
gular value decomposition (ASVD). Next we introduce modifications of both ALS 
and ASVD, that are computationally more expensive, but for which it is guaranteed 
that they will always converge to a semi-maximum of /. 

Our numerical experimentation do not show clearly that ASVD is always better 
than ALS. Since the standard algorithm for computing the maximal singular value 
of a matrix is a truncated SVD algorithm [8], and not ALS, we believe that ASVD 
is a very valid option in finding best rank one approximations of tensors. 

The content of the paper is as follows. In section [2] we recall some basic facts 
about tensors and best rank one approximations. In section [3] we recall the ALS 
method and introduce the ASVD procedure. The modification of these methods 
to guarantee convergence to a semi-maximum is introduced in section [4] and the 
performance of the new methods is illustrated in section [5] In section [6] we state 
the conclusions of the paper. In an Appendix we discuss the notion of local semi- 
maximality, give examples and discuss conditions for which ALS converges to a local 
semi-maximal point. 

2 Basic facts on best rank one approximations of d- 
mode tensors 

In this section we present further notation and recall some known results about best 
rank one approximations. 

For a d-mode tensor T = [Ui,...,i d ] G R m ^ x - Xm d , denote by ||T|| := \/(T,T) the 
Hilbert-Schmidt norm. Denote by S(m) the d-product of the sub-spheres S mi_1 x 
... x S md , let (xi,...,Xd) £ S(m) and associate with (xi,...,Xd) the d one 
dimensional subspaces U< = span(xj), i G [d]. Note that 

II ®ie[d] x i|l = II IHI = L 

ie[d\ 

The projection P(g, iG[d] u l (T) of T onto the one dimensional subspace U := Oigj^Uj C 
® ie[d] R nH , is given by 

/(xi,...,x d )<g)j e [ d ]Xi, /(xi,...,x d ) := (T,<g>i 6 [d]Xi}, (xi,...,x d ) G S(m). (2.1) 

Denoting by P(® ie[d] u i ) ± 0~) the orthogonal projection of T onto the orthogonal 
complement of <8>i e [d]Uj, the Pythagoras identity yields that 

IIT|| 2 = ll^ eMU! (r)|| 2 + ||p (0i6[d]Ui) x(r)|| 2 . (2.2) 
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With this notation, the best rank one approximation of T from S(m) is given by 



min min 1 1 7" — a <8>,-p rrfi x$ 1 1 . 

(xi,...,x d )eS(m) aeK 1 1 

Observing that 

mm\\T-a® im yn\\ = \\T - P 9ieVlv . (T)\\ = \\P(® imVi )±(T)\\, 

it follows that the best rank one approximation is obtained by the minimization of 
H-^We[d]Ui)- L (7")ll- I n view of (2.2) we deduce that best rank one approximation is 
obtained by the maximization of ||-F® <e um. (7~)ll and finally, using (2.1), it follows 
that the best rank one approximation is given by 

cri(T) := max /(xi, . . . , x d ). (2.3) 

(xi,...,x d )eS(m) 

Following the matrix case, in [9] o"i(7~) is called the spectral norm and it is also 
shown that the computation of o"i(T) in general is NP-hard for d > 2. 

We will make use of the following result of p3], where we present the proof for 
completeness. 



Lemma 1 For T £ ygmix...xm d ^ ^ e cr itical points o//|g( m ), defined in (2.1), 
satisfy the equations 

T x (<S>ie[d]\{O x i) = Ax « foralli£[d], (xi, . . . ,x d ) G S(m), (2.4) 
/or some reaZ A. 

Proof. We need to find the critical points of (T, ^jeHiXj) where (xi, . . . , x^) 6 
S(m). Using Lagrange multipliers we consider the auxiliary function 

c/(xi,...,x d ) := (T,®j e [d]Xj) - ^ A,x ; ' x y . 

The critical points of g then satisfy 

7" x (®j 6 [d]\{i}Xj) = AtXi, i e [d], 



and hence (T, ®je[d]X 3 -) = AjX i x, = Aj for all i G [d] which implies (2.4). □ 



Observe next that (xi, . . . , x^) satisfies (2.4) iff the vectors (±xi, . . . , ±x^) satisfy 



(2.4). In particular, we could choose the signs in (±xi, . . . , ±x^) such that each 
corresponding A is nonnegative and then these A can be interpreted as the singular 
values of 7". The maximal singular value of T is denoted by o~i(T) and is given by 
(2.3). Note that to each nonnegative singular value there are at least 2 d ~ 1 singular 
vectors of the form (ixi, . . . , rtx^). So it is more natural to view the singular vectors 
as one dimensional subspaces Uj = span(xj), i € [d]. 

As observed in [5] for tensors, i. e., for d > 2, there is a one-to-one correspondence 
between the singular vectors corresponding to positive singular values of T and the 
fixed points of an induced multilinear map of degree d — 1. 
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Lemma 2 Let d > 2 and assume that T G ^rnix...xm d _ Associate with T the 
map F from E(m) := M™ 1 x . . . x M md to itself, where 

F := (Fi,...,F d ), Fi(ui,...,u d ) :=Tx (®j 6 [d]\{i}^)» * G M- 

TTien i/iere is a one-to-one correspondence between the critical points o//|g( m ) cor- 
responding to positive singular values A and £/ie nonzero fixed points of 

F(u) = u. (2.5) 



Namely, each (xi, . . . , x^) G S(m) satisfying (2.4) wzf/i A > induces a fixed point 
of F of the form 

-i 

(ui,...,u d ) = A<f-2(xi,...,x d ). 



Conversely, any nonzero fixed point satisfying (2.5) induces a d-set of singular vec- 
tors (xi,...,X(i) = jj^-jj (ui, . . . , Urf) € S(m) corresponding to A = ||ui||~( d ~ 2 ). in 
particular, the spectral norm o"i(T) corresponds to a nonzero fixed point o/F closest 
to the origin. 



. . . . d-l 

Proof. Assume that (2.4) holds for A > 0. Dividing both sides of (2.4) by A d ~ 2 
— _ x 

we obtain that (ui, . . . , ty) = A d ~ 2 (xi, . . . , x^) is a nonzero fixed point of F. 

For the converse, assume that (ui, . . . , iid) is a nonzero fixed point of F. Clearly 
u^Uj = (T, Xje[d]Uj) for i G [d]. Hence, ||ui|| = . . . = \\ud\\ > and (xi, . . . ,x rf ) = 



^(ui,...,^) G S(m) satisfies with A = ||ui||~(^ 2 ). 



The largest positive singular value of T corresponds to the nonzero fixed point 
(ui, . . . , iid), where Yli^id] ll u *ll 2 = ^ll u ill 2 i s the smallest. □ 
We also have that the trivial fixed point is isolated. 

Proposition 3 The origin G M(m) is an isolated simple fixed point o/F. 

Proof. A fixed point of F satisfies 

u-F(u) = (2.6) 

and clearly, u = satisfies this system. Observe that the Jacobian matrix D(u — 
F(u))(0) is the identity matrix. Hence the implicit function theorem yields that 



is a simple isolated solution of (2.5). □ 



In [7] the following results are established. First, for a generic T G j£ m i x --- xm d 
the best rank one approximation of T is unique. Second, a complex generic T G 
Qm 1 x...xm d a f] n ite number v(m\, . . . ,md) of singular value tuples and the cor- 
responding "singular complex values" A. We now consider the "cube" case where 
mi = . . . = m<2 = to. Then u(m, . . . , m) is different from the number of complex 
eigenvalues computed in [lj. Finally, for a generic symmetric tensor T G j£ mx — x™^ 
the best rank rank approximation is unique and symmetric. (The fact that the best 
rank one approximation of a symmetric tensor can be chosen symmetric is proved 
in 0.) 
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3 The ALS and the ASVD method 



In this section we briefly recall the alternating least squares (ALS) method and 
suggest an analogous alternating singular value decomposition (ASVD) method. 

Consider T G M m i x --- xm d\{0} and choose an initial point (x^o, • • . , x^o) G S(m) 
such that /(xi 5 o, . . . , x^o) 7^ 0. This can be done in different ways. One possibility is 
to choose (xi 5 o, • • • , x<jo) G S(m) at random. This will ensure that with probability 
one we have /(x^o, • • • , x<£o) 0- Another, more expensive way to obtain such an 
initial point (x^o, . . • , x^o) is to use the higher order singular value decomposition 
(HOSVD) 0. To choose x ii0 view T as an m, x miX m - xmd ma trix A*, by unfolding 
in direction i. Then Xj is the left singular vector corresponding to (j\(Ai) for i G [d]. 
The use of the HOSVD is expensive, but may result in a better choice of the initial 
point. 

Given (xi iP , . . . , X-d,p) G S(m), for an integer p > the points Xj iP+ i G S mi_1 are 
then computed recursively via 



for % G [d]. Clearly, we have the inequality 

/(Xl,p+1 , • • • , Xj—^p-j-i , Xj p, . . . , ~X-d,p) < /(xi ; p_)_i , . . . , Xj 5 p+i , Xj+^p, . . . , X^ p) , 

for i G [d] and p > 0, and the sequence /(xi )P , . . . , x^ p ),p = 0, 1, ... is a nondecreas- 
ing sequence bounded by ci(T), and hence it converges. 

Recall that a point (x^*, . . . , x^*) G S(m) is called a 1-semi maximum, if Xj * is a 
maximum for the function f(x.\ * : . . . , x^_ x,*j X2, x^-j-i^, . . . , x^* ) restricted to S mi 
for each % G [d]. Thus, clearly any 1-semi maximal point of / is a critical point of /. 
In many cases the sequence (xi )P , . . . , Xd lP ),p = 0, 1, . . . does converge to a 1-semi 
maximal point of /, however, this is not always guaranteed 112;. 

An alternative to the ALS method is the alternating singular value decomposition 
(ASVD). To introduce this method, denote for A G W mx£ by u(A) G S m ~ 1 ,v(A) G 
S^ 1 the left and the right singular vectors of A corresponding to the maximal 
singular value a\(A), i. e., 

u{A) T A = a l (A)v{A) T , Av{A) = <n(A)u(A). 

Since for d = 2 the singular value decomposition directly gives the best rank one 
approximation, we only consider the case d > 3. Let T = [U lt ... i d ] G K m i x --- Xm d and 
X := (xi, . . . ,Xd) G S(m) be such that /(xi, . . . , x<i) 7^ 0. Fix an index pair (i,j) 
with 1 < % < j < d and denote by Xij the d — 2 tensor <8>fceui\{i J j}Xfc. Then T x X^j 
is an rrii x rrij matrix. 

The basic step in the ASVD method is the substitution 

(x^xj) 1 y (u(T x Xij), v(T x Xij)). (3.1) 

For example, if d = 3 then the ASVD method is given by repeating iteratively the 



substitution (3.1) in the order 

(2,3), (1,3), (1,2). 
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For d > 3, one goes consecutively through all ( 2 ) pairs in an "evenly distributed 
way" . For example, if d = 4 then one could choose the order 

(1,2), (3,4), (1,3), (2,4), (1,4), (2,3). 



Observe that the substitution (3.1 ) gives (7i(TxAfjj). Note that the ALS method 
for the bilinear form g(x, y) = x 1 (T x <-fij)y could increase the value of g at most 
to its maximum, which is o~i(T x Afjj). Hence we have the following proposition. 

Proposition 4 Let T G W n ^ x --- Xm d \{0} and assume that (xi, . . . ,x d ) G S(m). 
Fix 1 < i < j < d and consider the following three maximization problems. First, fix 
all variables except the variable x p and denote the maximum of /(xi, . . . , x^) over 
x p G S mp_1 by a p . Then find ai,aj. Next fix all the variables except Xj,Xj and find 
the maximum of /(xi, . . . , x<j) over (xj,Xj) G S m * x S m * , which is denoted by 
bij. Then bij > max(aj, aj). In particular one step in the AS VD increases the value 
of f as least as much as a corresponding step of ALS. 

The procedure to compute the largest singular value of a large scale matrix is based 
on the Lanczos algorithm |8j implemented in the partial singular value decomposi- 
tion. Despite the fact that this procedure is very efficient, for tensors each iteration 



of ALS is still much cheaper to perform than one iteration of (3.1). However, it 
is not really necessary to iterate the partial SVD algorithm to full convergence of 
the largest singular value. Since the Lanczos algorithm converges rapidly [8], a few 
steps (giving only a rough approximation) may be enough to get an improvement 
in the outer iteration. In this case, the ASVD method may even be faster than the 
ALS method, however, a complete analysis of such an inner-outer iteration is an 
open problem. As in the ALS method, it may happen that a step of the ASVD 
will not decrease the value of the function /, but in many cases the algorithm will 
converge to a semi-maximum of /. However, as in the case of the ALS method, we 
do not have a complete understanding when this will happen. For this reason, in 
the next section we suggest a modification of both ALS and ASVD method, that 
will guarantee convergence. 



4 Modified ALS and ASVD 

The aim of this section is to introduce modified ALS and ASVD methods, abbre- 
viated here as MALS and MASVD. These modified algorithms ensure that every 
accumulation point of these algorithms is a semi-maximal point of /|s(m)- For sim- 
plicity of the exposition we describe the concept for the case d = 3, i. e., we assume 
that we have a tensor T G R mxnxZ . 

We first discuss the MALS. For given (x, y,z) G S(m,n,l) with /(x, y, z) ^ 0, 
the procedure requires to compute the three values 

/l(x ' y ' 2) := Vx(y^z)||' y ' Z) ' 

ft \ ft Tx (x0z) ^ 

/ 2 (x,y,z) := /(x 



/3(x,y,z) := /(x,y 



T x (x<g)z)||' 
T x (x® y) 



|T x (x®y) 
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and to choose the maximum value. This needs 3 evaluations of /. 

The modified ALS procedure then is as follows. Let (xo,yo,zo) G S(m, n, I) 
and /(xo,yo,zo) 7^ 0. Consider the maximum value of /j(xo,yo,zo) for i = 1,2,3. 
Assume for example that this value is achieved for i = 2 and let yi := j|^x(xocg>zo)|| • 
Then we replace the point (xo,yo,zo) with the new point (xo,yi,zo) and consider 
the maximum value of /i(xo, yi, zq) for i = 1, 2, 3. This needs only two / evaluations, 
since /2(xo,yo,zo) = f 2(^-0, yi, zo). Suppose that this maximum is achieved for 
i = l. We then replace the point in the triple (xo,yi,zo) with (xi,yi,zo) where 

xi = 117-x (yi^zo)ll an< ^ ^hen as ^ ne ^ as ^ s ^ e P we °pti mrze the missing mode, which is 
in this example i = 3. In case that the convergence criterion is not yet satisfied, we 
continue iteratively in the same manner. The cost of this algorithm is about twice 
as much as that of ALS. 

For the modified ASVD we have a similar algorithm. For (x,y,z) G S(m, n, I), 
/(x,y,z)/0, let 



#i(x,y,z) 
g 2 (x,y,z) 
ff3(x,y,z) 



/(x,u(T xx),v(T xx)), 
/(u(Txy),y,v(Txy)), 
/(u(T x z),v(T x z),z), 



which requires three evaluations of/. Let (xo, yo, zo) G S(m, n, I) and /(xo, yo, zo) 7^ 
and consider the maximal value of <7i(xo, yo, zo) for i = 1, 2, 3. Assume for example 
that this value is achieved for i = 2. Let xi := u(T x yo),zi := v(7~ x yo). Then 
we replace the point (xo,yo,zo) with the new point (xi,yo,zi) and determine the 
maximal value of <?i(xi, yo, zi) for i = 1,2,3. Suppose that this maximum is achieved 
for i = 1. We then replace the point in the triple (xi,yo,zi) with (xi,yi,Z2) where 
yi = u(T x xi),Z2 = v(7" x Xx) and if the convergence criterion is not met then 
we continue in the same manner. This algorithm is about twice as expensive as the 
ASVD method. For d = 3, we then have the following theorem. 

Theorem 5 Let T G R mxnxl be a given tensor and consider the sequence 

(xj, y i5 G S(m, n, I) for i = 0,1, ... , (4.1) 

generated either by MALS or MASVD, where /(xo,yo,zo) 7^ 0. If (x*,y*,z*) G 
S(m,n,Z) is an accumulation point of this sequence, then (x* , y* , z* ) G S(m, n, /) is 



a 1-semi maximum if (4.1) is given by MALS and a 2-semi maximum if (4.1) is 
given by MASVD. 

Proof. Let (x* , y* , z* ) G S(m, n, I) be an accumulation point of the sequence 



(4.1), i.e., there exists a subsequence 1 < n\ < 712 < n$ < . . . such that 
lim J ^ 00 (x nj .,y n3 ,z nj ) = (x*, y*, z#). Since the sequence /(x^y^Zj) is nondecreas- 
ing, we deduce that limj_ s . 00 /(xj, y^, Zj) = /(x*, y*, z*) > 0. By the definition of 
fi(x.*, y*, z*) it follows that 

min{/,-(x*,y*,z*), j = 1,2,3} > /(x*,y*,z*). (4.2) 

Assume first that the sequence (|4.1|) is obtained by either ALS and MALS. We will 



point out exactly, where we need the assumption that (4.1) is obtained by MALS 



to deduce that (x* , y* , z* ) G S(m, n, I) is a 1-semi maximum. 
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Consider first the ALS sequence given as in (1.3). Then 



/(xi,yi_i,Zi_i) = /i(xi_i,yi_i,Z£_i) 

< /(xi,yi,Zj_i) = / 2 (xi,yj_i,Zj_i) 

< /(xi,yi,Zi) = /3(xi,yi,Zi_i). (4.3) 

For any e > 0, since /i(x, y, z) is a continuous function on S(m, n, I), it foilows that 
for a sufficiently large integer j that /i (x nj , y nj , z Uj ) > /i (x* , y* , z* ) — e. Hence 

/(x*,y*,z*) > /(x„ 3+ i,y nj+ i,y nj+ i) > /i(x nj+ i, y Uj , z nj ) > /i(x*, y*, z*) - e. 

(4-4) 

Since e > can be chosen arbitrarily small, we can combine inequality (4.4) with 



( |4.2[ ) to deduce that /i(x*, y*, z*) = /(x*, y*, z*). We can also derive the equality 
/3(x*, y*, z*) = /(x*, y*, z*) as follows. Clearly, 

/(Xrij i y?ij i Zjij — l) ^ fsip^rij > y«j > z nj — l) = /(Xrij ) ynj j z rij ) ^ /(x n ^ +1 , y«j +1 , z n J+ i ) 

Using the same arguments as for /i we deduce the equality /3 (x* , y* , z* ) = /(x* , y* , z* ] 
However, there is no way to deduce equality in the inequality (x* , y* , z* ) > 
/(x*, y*, z*) for the ALS method, since /2(xj,y,Zj) = /(xj, Uj, Zj) and Uj is not 
equal to y» or y i+1 . 

We now consider the case of M ALS. We always have the inequalities /j(xj, y^, Zj) < 
/(xj + i, yj+i, Zj + i) for each j = 1, 2, 3 and i € N. Then the same arguments as before 
imply in a straightforward way that we have equalities in (4.2). Hence (x*,y*,z*) 
is a 1-semi maximum. 

Similar arguments show that if the sequence (4.1) is obtained by MASVD then 
5A;(x*, y*, z*) = /(x*, y*, z*) for fc € [3]. Hence (x* , y* , z* ) is a 2-semi maximum. 

□ 

It is easy to accelerate the convergence of the MALS and MASVD algorithm in the 
neighborhood of a semi-maximum via Newton's method, see e.g. [18] . 

The following questions remain open. Suppose that the assumptions of Theo- 
rem[5]hold. Assume further, that one accumulation point (x*, y*, z*) of the sequence 
(4.1) is an isolated critical point of f\s(m,n,i)- I s it true that for the MALS method 
the sequence (4.1) converges to (x* , y* , z* ) , where we identify — x*, — y*, — z* with 
x* , y* , z* respectively? Is the same claim true for the MASVD method assuming 
the additional condition 



01 (T x x*) > a 2 (T x x*), 01 (T x y„) > a 2 (T x y*), a\(T x z*) > a 2 {T x z*)? 



5 Numerical results 

We have implemented a C++ library supporting the rank one tensor decomposition 
using vmmlib [16] . LAPACK and BLAS in order to test the performance of the 
different best rank one approximation algorithms. The performance was measured 
via the actual CPU-time (seconds) needed to compute the approximate best rank 
one decomposition, by the number of optimization calls needed, and whether a 
stationary point was found. All performance tests have been carried out on a 2.8 
GHz Quad-Core Intel Xeon Macintosh computer with 16GB RAM. 
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The performance results are discussed for synthetic and real data sets of third- 
order tensors. In particular, we worked with three different data sets: (1) a real 
computer tomography (CT) data set called MELANIX (an Osirix data set), (2) a 
symmetric random data set, where all indices are symmetric, and (3) a random data 
set. The CT data set has a 16bit, the random data set an 8bit value range. All our 
third-order tensor data sets are initially of size 512 x 512 x 512, which we gradually 
reduced by a factor of 2, with the smallest data sets being of size 4x4x4. The 
synthetic random data sets were generated for every resolution and in every run; 
the real data set was averaged (subsampled) for every coarser resolution. 

Our simulation results are averaged over different decomposition runs of the 
various algorithms. In each decomposition run, we changed the initial guess, i.e., we 
generated new random start vectors. We always initialized the algorithms by random 
start vectors, since this is cheaper than the initialization via HOSVD. Additionally, 
we generated for each decomposition run new random data sets. The presented 
timings are averages over 10 different runs of the algorithms. 

All the best rank one approximation algorithms are alternating algorithms, and 
based on the same convergence criterion, where convergence is achieved if one of 
the two following conditions: iterations > 10; fitchange < 0.0001 is met. The 
number of optimization calls within one iteration is fixed for the ALS and ASVD 
method. During one iteration, the ALS optimizes every mode once, while the ASVD 
optimizes every mode twice. The number of optimization calls can vary widely 
during each iteration of the modified algorithms. This results from the fact that 
multiple optimizations are performed in parallel, while only the best one is kept and 
the others are rejected. 

The partial SVD is implemented by applying a symmetric eigenvalue decompo- 
sition (LAPACK DSYEVX) to the product AA T (BLAS DGEMM) as suggested by 
the ARPACK package. 

The first observation regarding the average timings is that third-order tensors 
(tensor3s) of size 64 3 or smaller took less than one second for the decomposition, 
which represents a time range, where we do not need to optimize further. On the 
contrary, the larger the third-order tensor gets, the more critical the differences in 
the decomposition times are. As shown in Figure [TJ the modified versions of the 
algorithms took about twice as much CPU-time as the standard versions. For the 
large data sets, the ALS and ASVD take generally less time than the MALS and 
MASVD. The ASVD was fastest for large data sets, but compared to (M)ALS slow 
for small data sets. For larger data sets, the timings of the basic and modified 
algorithm versions came closer to each other. 

We also compared the number of optimization calls needed for the algorithms 
of ALS, ASVD, MALS, and MASVD, recalling that for the ALS and the MALS, 
one mode is optimized per optimization call, while for ASVD and MASVD, two 
modes are optimized per optimization call. Figure 2 demonstrates the relationships 
of the four algorithms on three different data sets (color and marker encoded). 
As can be seen, the ASVD has the smallest number of optimization calls followed 
by the ALS, the MASVD and the MALS. One notices as well that the number 
of optimization calls for the two random data sets are close to each other for the 
respective algorithms. The real data set takes most optimization calls, even though 
it probably profits from more potential correlations. However, the larger number of 
optimization calls may also result from the different precision of one element of the 
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melanix-64 symmetric-64 random-64 melanix-32 symmetric-32 random-32 
ALS ASVD "MALS ■ MASVD tensor3 sample 

(a) CPU time (s) for medium 3-mode tensor samples 

sec n 




melanix-512 symmetric-512 random-512 melanix-256 symmetric-256 random-256 
als asvd ■ mals ■ masvd tensor3 sample 

(b) CPU time (s) for larger 3-mode tensor samples 

Figure 1: Average CPU times for best rank one approximations per algorithm and 
per data set taken over 10 different initial random guesses. 
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third-order tensor (16bit vs. 8bit values). Another explanation might be that it was 
difficult to find good rank one bases for a real data set (the error is approx. 70% 
for the 512 3 tensor). For random data, the error stays around 63%, probably due to 
a good distribution of the random values. Otherwise, the number of optimization 
calls followed the same relationships as already seen in the timings measured for the 
rank one approximation algorithm. 




ALS-melanix ALS-symmetric ALS-random tenSOr3 Size 

— H — ASVD-melanix — A — ASVD-symmetric — 6 — ASVD-random 

■■£)■■ MALS-melanix MALS-symmetric ■■©■• MALS-random 

-□ -MASVD-melanix -A -M ASVD-symmetric -G -MASVD-random 



Figure 2: Average number of optimization calls needed per algorithm and per data 
set taken over 10 different initial random guesses. 



It is not only important to check how fast the different algorithms perform, 
but also what quality they achieve. This was measured by checking the Frobenius 
norm of the resulting decompositions, which serves as a measure for the quality of 
the approximation. In general, we can say that the higher the Frobenius norm, 
the more likely it is that we find a global maximum. Accordingly, we compared 
the Frobenius norms in order to say whether the different algorithms converged to 
the same stationary point. In Table 1, we show the average Frobenius norm of 
the input tensor3 and compare it to the Frobenius norms of the approximations 
by ALS, ASVD, MALS, and MASVD. We observed that all the algorithms reach 
the same stationary point for the smaller and medium data sets. The computed 
Frobenius norms have the same value except for the ASVD result of the symmetric 
4x4x4 data set, where the final Frobenius norm is much higher. However, for 
the larger data sets (> 128 3 ) the stationary points differ slightly. We suspect that 
either the same stationary point was not achieved, or the precision requirement of 
the convergence criterion was too high. That means that the algorithms stopped 
earlier, since the results are not changing that much anymore in the case that the 
precision tolerance for convergence is 0.0001. 

The results of best rank one approximation for symmetric tensors using ALS, 
MALS, ASVD and MASVD show that the best rank one approximation is also 
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symmetric, i.e., is of the form ou® v®w, where u ~ v ~ w E S m . This confirms 
an observation made by Paul Van Dooren, (private communication), and the main 
result in [5], which claims that the best rank one approximation of a symmetric 
tensor can be always chosen symmetric. The results of ASVD an MASVD give a 
better symmetric rank one approximation, i.e., u — v, u — w in ASVD and MASVD 
are smaller than in ALS and MALS. 





Input 


ALS 


ASVD 


MALS 


MASVD 


melanix-512 


4038720 


2845680 


2839270 


2845680 


2840890 


melanix-256 


1416420 


1006170 


1004060 


1006160 


1004360 


melanix-128 


490619 


355487 


353809 


355487 


355360 


melanix-64 


167341 


125566 


125566 


125566 


125566 


melanix-32 


56415 


44066 


44066 


44066 


44066 


melanix-16 


18667 


15393 


15393 


15393 


15393 


melanix-8 


5925 


5261 


5261 


5261 


5261 


melanix-4 


1674 


1619 


1619 


1619 


1619 


symmetric-512 


1700600 


1471290 


1435540 


1471290 


1438110 


symmetric-256 


601363 


520329 


507681 


520329 


508588 


symmetric- 128 


212602 


183949 


179476 


183949 


179797 


symmetric-64 


75077 


64969 


64969 


64969 


64969 


symmetric-32 


26619 


23068 


23068 


23068 


23068 


symmetric-16 


9417 


8184 


8184 


8184 


8184 


symmetric-8 


3198 


2759 


2759 


2759 


2759 


symmetric-4 


1133 


876 


945 


880 


945 


random-512 


1700610 


1471340 


1435570 


1471370 


1438140 


random-256 


601277 


520217 


507576 


520218 


508487 


random- 128 


212581 


183926 


179459 


183926 


179783 


random-64 


75170 


65056 


65056 


65056 


65056 


random-32 


26608 


23045 


23045 


23045 


23045 


random- 16 


9423 


8173 


8173 


8173 


8173 


random-8 


3330 


2895 


2895 


2895 


2895 


random-4 


1156 


1017 


1017 


1017 


1017 



Table 1: Average Frobenius norms: Initial Frobenius norm vs. the Frobenius norm 
of the approximations per algorithm and per data set (average taken over 10 different 
initial random guesses). 



6 Conclusions 

We have presented a new alternating algorithm for the computation of the best rank 
one approximation to a d-mode tensor. In contrast to the alternating least squares 
method, this method uses a singular value decomposition in each step. In order to 
achieve guaranteed convergence to a semi-maximal point, we have modified both 
algorithms. We have run extensive numerical tests to show the performance and 
convergence behavior of the new methods. 
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Appendix: Remarks on local semi-maximality 

In this appendix we discuss the notion of an isolated critical point of a function / 
which is semi-maximal but not maximal. The main emphasize is to characterize 
semi-maximal points for which the alternating maximization iteration, abbreviated 
as AMI, converges to the critical point at least for some nontrivial choices of the 
starting points. We explain the convergence issues for ALS on local semi-maximality 
by the help of the AMI. 

Consider a polynomial function p(t), t E M. N and let M C M. N be a smooth 
compact manifold of dimension L. Denote by g the restriction of p to M. For 
example, in the three mode case we let N = m + n + I, t = (x,y,z), p(t) = 
T x (x <g> y <g> z) and M = S m_1 x S n_1 x S*" 1 , L = N - 3. Assume that a point 
t* £ M is a non-degenerate critical point of g on M. We take local coordinates 
around t*, so that in these local coordinates corresponds to the zero vector of 
dimension L, denoted as 0^. So the open connected neighborhood of t* is identified 
with an open connected neighborhood 0^ G M. L . Assume that the local coordinates 



around l are x = [x 



i ' 



The AMI method consists of maximizing g (or /) on x,- for j = 1, . . . , d, and 
then repeating the process. Let us discuss the details of the AMI for a function / 



given by a quadratic form in the block vector x = [x 



l > ■ 



G 



given by 



/ 



-x 1 Hx, H 



#1,1 #1,2 
#2,1 #2,2 



#<*,: 



# 



H p, q = #?,p> p,q 6 [d]- 



d,d— 1 



#l,d 
#2,d 

HdA 



(6.1) 



Note that locally we obtain this form for general / via Taylor expansion and leaving 
off terms of order higher than two. 



/,— !■•••' €d,k- J 



Consider the AMI iteration £fc_i 
l L for a function / of the form (6.1) starting from a point £o- Then this iteration is 



the block Gaufi-Seidel iteration, see e.g. [T7], applied to the linear system — H£ = 
with the block symmetric matrix H, i.e., 



1=1 £=j+l 



j = i,...,d, keN. 



(6.2) 



This iterative method can be expressed as —Ln^k = #ff6e-l) where H = Lh + Uh 
is the decomposition of H into the block lower triangular part Lh and the strict 
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block upper triangular part Uh- Assume that Lh is invertible, which is equivalent 
to the requirement that all diagonal blocks Hjj are invertible. Then (6.2) is of the 
form £fc = K£k-i, where 

K := -L$U H . (6.3) 

It is well known that an iteration = K^-i will converge to 0^ for all starting 
vectors £o if and only if the spectral radius of K, denoted as p(K), is less than 1. If 
p{K) > 1 then the iteration will converge to 0^ if and only if £o lies in the invariant 
subspace of K associated with the eigenvalues of modulus less than 1. 

Assume in the following that 0i := [OjL, . . . ,0 m J T is a semi-maximal point, 
i.e., that all diagonal blocks Hjj, j £ [d] of H are positive definite. Then it follows 



from a classical result of Ostrowski, see e.g. [17} Thm 3.12], that the iteration (6.2) 
converges to 0^ if and only if H is positive definite, which is equivalent to p(K) < 1. 
Clearly, in this case 0^ is non-maximal for /(£) if and only if H is indefinite. 

We summarize these observations to give a precise condition on £o so that the 



iteration (6.2) converges to zero. 



Theorem 6 Let 0^ := [0^ , . . . , m J T be a semi-maximal point of /(£) = 
-£ T H{;, i.e., each Ha is positive definite and let K be given by (6.3). Denote 



by a, /3, 7 the number of eigenvalues A of K , counting with multiplicities, satisfying 
|A| < 1, |A| > 1, |A| = 1, respectively. Assume that H has tt,v,C positive, negative 
and zero eigenvalues, respectively. Then 

7r > max{mj,j G [d]}, (6-4) 
a = 7T, P = v, 7 = C- (6-5) 

Furthermore, all 7 eigenvalues of K on the unit circle correspond to a unique eigen- 
value 1 of geometric multiplicity 7. The corresponding eigenvectors of K are the 
eigenvectors of H corresponding to the zero eigenvalue. 



Proof. We first prove (6.4). Let Hi i be the diagonal block of maximal size mj. 
Let H be a principal submatrix of H of order 7x1% + 1 which has Hi^ as its submatrix. 
The Cauchy interlacing theorem [10] implies that the eigenvalues of H interlace with 
the eigenvalues of Hi^. Since all eigenvalues of Ha are positive it follows that H 



has at least m, positive eigenvalues and hence, (6.4) holds. 



To prove (6.5), assume first that £ > 1. But if x is an eigenvector of H corre- 
sponding to the eigenvalue then Kx = x. Hence 7 > £, and 1 is an eigenvalue of 
K of geometric multiplicity at least £. 

Let Vo be the null space of H. Then K restricted to Vo is the identity operator. 
Consider the quotient space Q := M L /Vo. Clearly, K and H induce linear operators 
K\,H\ : Q — > Q, where Hi is nonsingular with ir positive eigenvalues and v negative 
eigenvalues. Observe also that if y, z G IR and y — z 6 Vo then y T Hy = z T Hz. 
Thus, it is enough to study the eigenvalues of K\, which corresponds to the case 
where H is nonsingular, which we assume from now on. 

Observe that the AMI does not decrease the value of /(£). Moreover, = 
if and only if ^k-i = Ol- Let us, for simplicity of notation, consider the 
iteration ^ = in the complex setting, i.e., we consider F(£) = —£*H(; , where 

£ € C L . All the arguments can also be carried out in the real setting, by considering 
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pairs of complex conjugate eigenvalues and the corresponding real invariant subspace 
associated with the real and imaginary part of an eigenvector. 

Let A be an eigenvalue of K and let £o be the eigenvector to A. Then F(£i) = 
|A| 2 F(£o) > F(€o) which implies that |A| 7^ 1. (This implies that the only eigenvalue 
of K of modulus 1 can be the eigenvalue 1, which corresponds to the eigenvalue 
of H.) 

Observe next, that if H is positive definite, then F(£o) < and the inequality 
F(gi) > F(£o) yields that |A| 2 < 1, i.e., p(K) < 1, which is Ostrowski's theorem. 

^From now on we therefore assume that H is indefinite and nonsingular. As- 
sume that F(£o) > and £0 7^ Ol- Then F(£k) is an increasing sequence which 
either diverges to +00 or converges to a positive number. Hence we cannot have 
convergence — > Ol- More precisely, we have convergence — > Ol if and only if 
F(£k) < for all k > 0. 

Let Uo C Ui C C L be the invariant subspaces of K corresponding to the 
eigenvalues and the eigenvalues A of modulus less than 1 of K, respectively. So 
A'Uo C Uo and if |Uo is nilpotent. Let Iq = dimUo- We have that F(£) < for 
all £ G U. Let V_ , V + C C L be the eigen-subspaces corresponding to negative and 
positive eigenvalues of H, respectively. So tt = dim V + , v := dim V_ and ir + v = L. 
Consider W = Range (K L ). Then 

U n W = {0 L }, dim W = L-l , KW = W, W + U = C L . 

With W+ : = W n V+, then we have that dim W + > ir - l and K\ := K\W is 
invertible. Setting Wj = K^ J W + , we have that £j G Wj, and F(K k ^j) < for 
k = 0, . . . ,j, and clearly, dim Wj = dim W + . Since the space of dim W + subspaces 
in C is compact, there exists a subsequence of ~Wj k ,k G N which converges to a 
dim W + dimensional subspace X C C L . This subspace corresponds to the invariant 
subspace of K associated with eigenvalues satisfying < |A| < 1, since 
for all k > and £ G X. Thus, X n U = {0 L } and Ui = X + U . Note that 
dimUi = dimX + dimUo > vr. Since F(£) < for each ^ G Ui ,it follows that 
dimUi = 7r, i.e., a = tt. 

As a + f3 = L, it then follows that f3 = L — a = L — it = v. □ 

As an example, if we apply the ALS method for finding the maximum of the 
trilinear form T x (x y ® z) restricted to (x, y,z) G M = S m_1 x S n_1 x S'~\ 
then this is just the AMI for the local quadratic form g. It is well known that g 
may have several critical points, some of whom are strict local maxima and local 
semi-maxima see Example 2, p. 1331]. The above analysis shows that the ALS 
may converge to each of these points for certain appropriate starting points. For a 
specific T G W nxnxl one can expect that the ALS iteration exhibits a complicated 
dynamics. Hence, it is quite possible that in some cases the ALS method will not 
converge to a unique critical point, see also [31 [T2J [T3] . 
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